In their article, Wibral et al. (2013) propose a measure of interaction
delays rooted in an information-theoretic framework. Their measure, named
$TE_{SPO}$, where SPO stands for self-prediction optimality, is a time-delayed
extension of transfer entropy that becomes maximal only at the actual
interaction delay, as proven in their paper.

Wibral et al. only considered the bivariate coupling case. Prior to their
publication, the theory of detecting and quantifying causal interactions and
their strength in the general multivariate case was discussed in
Runge et al. (2012b,a). However, Wibral et al. present an interesting
complementary approach in that they propose to determine the interaction delay
based on the reconstructed (vector-valued) states rather than the (scalar)
observations of the complex systems under study.

Wibral et al. contrast their measure with a similar information-theoretic
measure, the momentary information transfer (MIT), that was introduced in
Pompe and Runge (2011). Wibral et al. write that "A major conceptual
difference between the Pompe and Runge study and ours is that no formal proof
of the maximality of their functional MIT at the correct interaction delay is
given, and as we argue below cannot be given." Rather than proving that such a
proof cannot be given, they construct a model example for which they find the
MIT unable to serve the purpose of inferring the interaction delay. But, as
shown below, the reasoning in their "maximality" proof equally applies to the
MIT and can, thus, not be used to disregard MIT. Rather, their example model
seems not to fulfill the assumptions implicitly used in their own proof.

The "formal proof of the maximality" of MIT was actually given in the
subsequent works Runge et al. (2012b,a). These works further developed the
original idea of Pompe and Runge (2011) by proposing a two-step approach. In
the first step, the causal graph (conditional independence graph) is
reconstructed (Runge et al., 2012b) using the well-established framework of
graphical models, where the property of capturing the correct causal
interaction delays is a trivial consequence of separation properties in the
graph. Runge et al. (2012b) also give the underlying assumptions for such an
inference. In the second step, the MIT is used as a measure of the coupling
strength solely of the causal links, i.e., the inferred interaction delays, in
this graph (Runge et al., 2012a).



Wibral et al. have also overlooked that their estimator of conditional mutual
information, given by their Eq. (19), had already been developed in
Frenzel and Pompe (2007). The latter work also discussed the inference of
interaction delays.



Proof that MIT is able to infer the correct interaction delay

Wibral et al. contrast their measure with the above-mentioned
information-theoretic quantity MIT, which has been thoroughly studied in
Runge et al. (2012a) and was introduced in the older publication
Pompe and Runge (2011). It is defined as

$$
I^{\mathrm{MIT}}_{X \to Y}(\tau) = I\left(X_{t-\tau}; Y_t \mid \mathcal{P}_{Y_t} \setminus \{X_{t-\tau}\}, \mathcal{P}_{X_{t-\tau}}\right), \qquad (1)
$$



i.e., in addition to the parents of $Y$, the parents of $X$ are included in the
condition. Why this is reasonable is extensively discussed in
Runge et al. (2012a). The discussion in Runge et al. (2012a) focused on what a
reasonable measure of the interaction strength is, given that the interaction
delays are already known since they have been inferred using the algorithm
introduced in Runge et al. (2012b).
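To make the definition in Eq. (1) concrete, the following minimal sketch evaluates MIT over a range of lags for a linear Gaussian toy process. Everything in it is an illustrative assumption rather than something taken from the papers under discussion: the autoregressive model and its coefficients, the function names `simulate`, `gaussian_cmi`, `mit`, and `mit_profile`, and the use of a partial-correlation formula for conditional mutual information, which is valid only for jointly Gaussian variables. The parent sets are taken as known from the model instead of being inferred by the graph-reconstruction step.

```python
import numpy as np


def simulate(n, delta, a=0.5, b=0.5, c=0.8, seed=0):
    """Assumed toy model: X_t = a X_{t-1} + noise, Y_t = b Y_{t-1} + c X_{t-delta} + noise."""
    rng = np.random.default_rng(seed)
    burn = 200  # discard the transient so the series is close to stationary
    ex = rng.standard_normal(n + burn)
    ey = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    y = np.zeros(n + burn)
    for t in range(max(1, delta), n + burn):
        x[t] = a * x[t - 1] + ex[t]
        y[t] = b * y[t - 1] + c * x[t - delta] + ey[t]
    return x[burn:], y[burn:]


def gaussian_cmi(u, v, cond):
    """I(U;V|cond) in nats for jointly Gaussian data, via the partial correlation."""
    z = np.column_stack([np.ones(len(u))] + cond)  # intercept plus conditions
    res_u = u - z @ np.linalg.lstsq(z, u, rcond=None)[0]
    res_v = v - z @ np.linalg.lstsq(z, v, rcond=None)[0]
    rho = np.corrcoef(res_u, res_v)[0, 1]
    return -0.5 * np.log(1.0 - rho ** 2)


def mit(x, y, tau, delta, max_lag):
    """MIT at lag tau, with the parent sets of the toy model taken as known."""
    series = {"X": x, "Y": y}
    parents_y = {("Y", 1), ("X", delta)}  # parents of Y_t as (variable, lag)
    parents_x = {("X", tau + 1)}          # parents of X_{t-tau}
    cond_nodes = (parents_y - {("X", tau)}) | parents_x
    t = np.arange(max_lag + 1, len(x))    # common time index for all lags
    cond = [series[name][t - lag] for name, lag in sorted(cond_nodes)]
    return gaussian_cmi(x[t - tau], y[t], cond)


def mit_profile(n=20000, delta=2, max_lag=5, seed=0):
    """Estimated MIT for tau = 1..max_lag in the toy model with true delay delta."""
    x, y = simulate(n, delta, seed=seed)
    return {tau: mit(x, y, tau, delta, max_lag) for tau in range(1, max_lag + 1)}


if __name__ == "__main__":
    for tau, value in mit_profile().items():
        print(f"tau={tau}: MIT ~ {value:.4f} nats")
```

In this toy setting the estimate should be clearly positive only at the lag that equals the coupling delay, because for every other lag the true parent $X_{t-\delta}$ remains in the conditioning set.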

Wibral et al. prove that their $TE_{SPO}$ is maximal only for the interaction
delay. To this end they use the Markov properties of the process and
information-theoretic properties of the conditional mutual information, such
as the data processing inequality and the chain rule (Cover and Thomas, 2006).

To show that this reasoning equally applies to MIT, we follow their proof
and add the modifications for MIT in red, i.e., the additional condition on the
parents of the lagged $X$, written as
$\textcolor{red}{\mathcal{P}_{X_{t-\delta-1}}}$. Thus, their proof is obtained
by leaving out the red variables. For a multivariate process $(X, Y)$ with
conditional independencies given by the graph shown in their Figure 2, we apply
the chain rule for conditional mutual information twice ($\delta \neq 0$):

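$$
\begin{aligned}
& I\left(Y_t; X_{t-\delta}, X_{t-\delta-1} \mid Y_{t-1}, \textcolor{red}{\mathcal{P}_{X_{t-\delta-1}}}\right) \\
&= I\left(Y_t; X_{t-\delta-1} \mid Y_{t-1}, \textcolor{red}{\mathcal{P}_{X_{t-\delta-1}}}\right)
 + \underbrace{I\left(Y_t; X_{t-\delta} \mid Y_{t-1}, X_{t-\delta-1}, \textcolor{red}{\mathcal{P}_{X_{t-\delta-1}}}\right)}_{\geq 0 \text{ due to non-negativity of CMI}} \qquad (2) \\
&= I\left(Y_t; X_{t-\delta} \mid Y_{t-1}, \textcolor{red}{\mathcal{P}_{X_{t-\delta-1}}}\right)
 + \underbrace{I\left(Y_t; X_{t-\delta-1} \mid Y_{t-1}, X_{t-\delta}, \textcolor{red}{\mathcal{P}_{X_{t-\delta-1}}}\right)}_{= 0 \text{ due to d-separation implying conditional independence}} \qquad (3)
\end{aligned}
$$

Thus,

$$
I\left(Y_t; X_{t-\delta-1} \mid Y_{t-1}, \textcolor{red}{\mathcal{P}_{X_{t-\delta-1}}}\right)
\leq I\left(Y_t; X_{t-\delta} \mid Y_{t-1}, \textcolor{red}{\mathcal{P}_{X_{t-\delta-1}}}\right), \qquad (4)
$$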



showing that Wibral's theorem applies to both their $TE_{SPO}$ and MIT. If
their example model contradicts this property, it simply means that their
example model does not fulfill the very basic assumptions used in this proof,
possibly the notion of d-separation. Unfortunately, Wibral et al. do not give a
derivation of their analytical computation of MIT for their model. In
Runge et al. (2012b) and Runge et al. (2012a) we refer to the condition (S) in
Eichler (2012) that a process must fulfill to guarantee that d-separation in
the causal graph implies conditional independence in the process. The model
actually features a deterministic coupling from X to Y and randomness only in
X. Only if even this remaining randomness is set to very small values does the
"inability" of MIT appear. Apart from the question of how realistic a purely
deterministic dependence is in real applications, information theory is in
general primarily suited for stochastic processes. As also discussed in the
original paper (Pompe and Runge, 2011), deterministic processes generate the
source entropy that MIT is based on only in the chaotic case, due to
quantization effects.
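
For a stochastic process that does satisfy these assumptions, the chain-rule argument of Eqs. (2)-(4) can be checked numerically. The sketch below does so for an assumed linear Gaussian toy model in which X is a first-order autoregressive process, so that the parent set of $X_{t-\delta-1}$ reduces to $X_{t-\delta-2}$. The model, its coefficients, and the names `simulate`, `cmi_from_cov`, and `chain_rule_terms` are hypothetical choices made for illustration, and the log-determinant formula for conditional mutual information holds only for jointly Gaussian variables.

```python
import numpy as np


def simulate(n, delta, a=0.5, b=0.5, c=0.8, seed=1):
    """Assumed toy model: X_t = a X_{t-1} + noise, Y_t = b Y_{t-1} + c X_{t-delta} + noise."""
    rng = np.random.default_rng(seed)
    burn = 200  # discard the transient so the series is close to stationary
    ex = rng.standard_normal(n + burn)
    ey = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    y = np.zeros(n + burn)
    for t in range(max(1, delta), n + burn):
        x[t] = a * x[t - 1] + ex[t]
        y[t] = b * y[t - 1] + c * x[t - delta] + ey[t]
    return x[burn:], y[burn:]


def cmi_from_cov(cov, x, y, z):
    """I(X;Y|Z) in nats for Gaussian variables, from covariance log-determinants."""
    def logdet(idx):
        return np.linalg.slogdet(cov[np.ix_(idx, idx)])[1] if idx else 0.0
    return 0.5 * (logdet(x + z) + logdet(y + z) - logdet(z) - logdet(x + y + z))


def chain_rule_terms(n=20000, delta=2, seed=1):
    """Estimate every term appearing in Eqs. (2)-(4) for the toy model."""
    x, y = simulate(n, delta, seed=seed)
    t = np.arange(delta + 2, len(x))
    # Columns: Y_t, Y_{t-1}, X_{t-delta}, X_{t-delta-1}, parent of X_{t-delta-1}
    data = np.column_stack(
        [y[t], y[t - 1], x[t - delta], x[t - delta - 1], x[t - delta - 2]]
    )
    cov = np.cov(data, rowvar=False)
    YT, Y1, XD, XD1, PX = range(5)
    z = [Y1, PX]  # common condition: Y_{t-1} and the parents of X_{t-delta-1}
    return {
        "joint": cmi_from_cov(cov, [XD, XD1], [YT], z),
        "lhs_4": cmi_from_cov(cov, [XD1], [YT], z),           # left side of (4)
        "nonneg_2": cmi_from_cov(cov, [XD], [YT], z + [XD1]),  # second term of (2)
        "rhs_4": cmi_from_cov(cov, [XD], [YT], z),            # right side of (4)
        "dsep_3": cmi_from_cov(cov, [XD1], [YT], z + [XD]),    # second term of (3)
    }


if __name__ == "__main__":
    for name, value in chain_rule_terms().items():
        print(f"{name}: {value:.5f} nats")
```

Because all terms are computed from the same sample covariance matrix, the two chain-rule decompositions agree up to floating-point error, while the d-separation term is zero only up to estimation error.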